For this homework, you can work as a team of size \(\le 5\). You can create a new private GitHub repository for collaboration (need to add @Hua-Zhou and @juhkim111 as collaborators) or re-use the current repository of a team representative. For each question, your report should have a clear description of the role of each team member, and the Git log should reflect individual contributions to the project.

## Q1 Learn by doing

I found the TensorFlow for R Blog series at RStudio quite illuminating. Choose one blog that interests you and do the following.

  1. Reproduce the results in the blog.

Solution

Our group searched the RStudio TensorFlow blog for an interesting project to undertake. After combing through several posts, we settled on "Neural style transfer with eager execution and Keras" (https://blogs.rstudio.com/tensorflow/posts/2018-09-10-eager-style-transfer/). The code uses neural style transfer techniques to transmute an original picture into the style of a second picture. The model the blog builds on is VGG19, a network pretrained on ImageNet. Rather than training the network, the blog computes a loss over selected VGG19 layers and backpropagates it to the input image to obtain the direction of the changes we want. Specific explanations of the model's functionality are provided at each step below. To run the code properly, we modified the original code from the blog, which contained errors, and we added comments explaining each section of the code:

#installing necessary packages
install.packages(c("tensorflow", "keras", "tfdatasets"))

library(tensorflow)
library(tfdatasets)
library(keras)
library(purrr)
library(glue)
library(reticulate)
use_implementation("tensorflow")
tfe_enable_eager_execution(device_policy = "silent")

The img_shape determines the resolution of content_image and style_image. The original blog post uses a 128x128 resolution. Here we treat resolution as one of our tuning parameters: we will compare and contrast different resolutions and their effect on the final transmuted image.

img_shape <- c(256, 256, 3)

We load the original content image:

content_path <-"~/biostatm280-winter2019-hw4/Q1/originalRendition/GreatWave/isar.jpg"

content_image <-  image_load(content_path, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

Now we load the style image:

style_path <- "~/biostatm280-winter2019-hw4/Q1/originalRendition/GreatWave/The_Great_Wave_off_Kanagawa.jpg"

style_image <-  image_load(style_path, target_size = img_shape[1:2])
style_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

Create the wrapper that loads and preprocesses the input images. The model that we employ here is VGG19, a network that has been trained on ImageNet.

#create a wrapper that loads an image and preprocesses it for VGG19
load_and_process_image <- function(path) {
  image_load(path, target_size = img_shape[1:2]) %>%
    image_to_array() %>%
    k_expand_dims(axis = 1) %>%
    imagenet_preprocess_input()
}

deprocess_image <- function(x) {
  x <- x[1, , ,]
  # Remove zero-center by mean pixel
  x[, , 1] <- x[, , 1] + 103.939
  x[, , 2] <- x[, , 2] + 116.779
  x[, , 3] <- x[, , 3] + 123.68
  # 'BGR'->'RGB'
  x <- x[, , c(3, 2, 1)]
  x[x > 255] <- 255
  x[x < 0] <- 0
  x[] <- as.integer(x) / 255
  x
}
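
The two helpers above are inverses up to clipping: `imagenet_preprocess_input()` converts RGB to BGR and subtracts the ImageNet per-channel means, and `deprocess_image()` adds the means back and restores RGB order. As a minimal base-R sketch of that round trip (on a made-up 2x2 image rather than Keras tensors):

```r
# ImageNet per-channel means in BGR order, as used in deprocess_image()
norm_means <- c(103.939, 116.779, 123.68)

# a made-up 2x2 RGB image with values in [0, 255]
img <- array(c(10, 20, 30, 40,     # R channel
               50, 60, 70, 80,     # G channel
               90, 100, 110, 120), # B channel
             dim = c(2, 2, 3))

# preprocess: RGB -> BGR, then subtract the channel means
pre <- img[, , c(3, 2, 1)]
for (ch in 1:3) pre[, , ch] <- pre[, , ch] - norm_means[ch]

# deprocess: add the means back, then BGR -> RGB
post <- pre
for (ch in 1:3) post[, , ch] <- post[, , ch] + norm_means[ch]
post <- post[, , c(3, 2, 1)]

all.equal(img, post)  # TRUE: the round trip recovers the original image
```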

Now we explore the layers of the neural network. There is no training within this model; instead, neural style transfer backpropagates the loss to the input image to move it closer to our desired result. There are two kinds of layers: content and style. The content layer compares the generated image to the content image via the loss function, while the 5 style layers, whose activations are analogous to low-level features such as texture, shapes, and strokes, compare it to the style image:

#setting the scene
content_layers <- c("block5_conv2")
style_layers <- c("block1_conv1",
                  "block2_conv1",
                  "block3_conv1",
                  "block4_conv1",
                  "block5_conv1")

num_content_layers <- length(content_layers)
num_style_layers <- length(style_layers)

get_model <- function() {
  vgg <- application_vgg19(include_top = FALSE, weights = "imagenet")
  vgg$trainable <- FALSE
  style_outputs <- map(style_layers, function(layer) vgg$get_layer(layer)$output)
  content_outputs <- map(content_layers, function(layer) vgg$get_layer(layer)$output)
  model_outputs <- c(style_outputs, content_outputs)
  keras_model(vgg$input, model_outputs)
}

The model optimizes three types of loss: content loss, style loss, and total variation (regularization) loss.

#define losses
content_loss <- function(content_image, target) {
  k_sum(k_square(target - content_image))
}

gram_matrix <- function(x) {
  features <- k_batch_flatten(k_permute_dimensions(x, c(3, 1, 2)))
  gram <- k_dot(features, k_transpose(features))
  gram
}

style_loss <- function(gram_target, combination) {
  gram_comb <- gram_matrix(combination)
  k_sum(k_square(gram_target - gram_comb)) /
    (4 * (img_shape[3] ^ 2) * (img_shape[1] * img_shape[2]) ^ 2)
}

total_variation_loss <- function(image) {
  y_ij  <- image[1:(img_shape[1] - 1L), 1:(img_shape[2] - 1L),]
  y_i1j <- image[2:(img_shape[1]), 1:(img_shape[2] - 1L),]
  y_ij1 <- image[1:(img_shape[1] - 1L), 2:(img_shape[2]),]
  a <- k_square(y_ij - y_i1j)
  b <- k_square(y_ij - y_ij1)
  k_sum(k_pow(a + b, 1.25))
}

content_weight <- 100
style_weight <- 0.8
total_variation_weight <- 0.01
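
To see what the Gram matrix in `style_loss()` measures, here is a plain base-R analogue of `gram_matrix()` on a tiny made-up feature map: the channels are flattened into rows and `G = F %*% t(F)` records the correlations between channels. Because spatial positions are summed out, shuffling the positions leaves the Gram matrix unchanged, which is why it captures texture rather than layout:

```r
# plain-R analogue of gram_matrix(): rows = channels, columns = spatial positions
gram <- function(feat) {
  f <- matrix(aperm(feat, c(3, 1, 2)), nrow = dim(feat)[3])
  f %*% t(f)
}

# a made-up 2x2 feature map with 3 channels
feat <- array(seq_len(12), dim = c(2, 2, 3))
g1 <- gram(feat)

# permute the spatial positions of every channel in the same way
perm <- feat
for (ch in 1:3) {
  perm[, , ch] <- matrix(as.vector(feat[, , ch])[c(4, 2, 3, 1)], 2, 2)
}
g2 <- gram(perm)

all.equal(g1, g2)  # TRUE: the Gram matrix ignores spatial arrangement
```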

Here we get the model outputs for content and style images:

#get model output for content and style 
get_feature_representations <-
  function(model, content_path, style_path) {
    
    # dim == (1, 128, 128, 3)
    style_image <-
      load_and_process_image(style_path) %>% k_cast("float32")
    # dim == (1, 128, 128, 3)
    content_image <-
      load_and_process_image(content_path) %>% k_cast("float32")
    # dim == (2, 128, 128, 3)
    stack_images <- k_concatenate(list(style_image, content_image), axis = 1)
    
    # length(model_outputs) == 6
    # dim(model_outputs[[1]]) = (2, 128, 128, 64)
    # dim(model_outputs[[6]]) = (2, 8, 8, 512)
    model_outputs <- model(stack_images)
    
    style_features <- 
      model_outputs[1:num_style_layers] %>%
      map(function(batch) batch[1, , , ])
    content_features <- 
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)] %>%
      map(function(batch) batch[2, , , ])
    
    list(style_features, content_features)
  }

Here we compute the losses for every loss type and gradient for the overall loss with respect to the initial image input:

#compute loss
compute_loss <-
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    
    c(style_weight, content_weight) %<-% loss_weights
    model_outputs <- model(init_image)
    style_output_features <- model_outputs[1:num_style_layers]
    content_output_features <-
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)]
    
    # style loss
    weight_per_style_layer <- 1 / num_style_layers
    style_score <- 0
    # dim(style_zip[[5]][[1]]) == (512, 512)
    style_zip <- transpose(list(gram_style_features, style_output_features))
    for (l in 1:length(style_zip)) {
      # for l == 1:
      # dim(target_style) == (64, 64)
      # dim(comb_style) == (1, 128, 128, 64)
      c(target_style, comb_style) %<-% style_zip[[l]]
      style_score <- style_score + weight_per_style_layer * 
        style_loss(target_style, comb_style[1, , , ])
    }
    
    # content loss
    weight_per_content_layer <- 1 / num_content_layers
    content_score <- 0
    content_zip <- transpose(list(content_features, content_output_features))
    for (l in 1:length(content_zip)) {
      # dim(comb_content) ==  (1, 8, 8, 512)
      # dim(target_content) == (8, 8, 512)
      c(target_content, comb_content) %<-% content_zip[[l]]
      content_score <- content_score + weight_per_content_layer *
        content_loss(comb_content[1, , , ], target_content)
    }
    
    # total variation loss
    variation_loss <- total_variation_loss(init_image[1, , ,])
    
    style_score <- style_score * style_weight
    content_score <- content_score * content_weight
    variation_score <- variation_loss * total_variation_weight
    
    loss <- style_score + content_score + variation_score
    list(loss, style_score, content_score, variation_score)
  }

#compute gradients
compute_grads <- 
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    with(tf$GradientTape() %as% tape, {
      scores <-
        compute_loss(model,
                     loss_weights,
                     init_image,
                     gram_style_features,
                     content_features)
    })
    total_loss <- scores[[1]]
    list(tape$gradient(total_loss, init_image), scores)
  }

Below, the style and content features are calculated just once; the image is then optimized iteratively. An output image is saved every 100 iterations until 500 iterations are reached.

#run the model/training phase
run_style_transfer <- function(content_path, style_path) {
  model <- get_model()
  walk(model$layers, function(layer) layer$trainable <- FALSE)
  
  c(style_features, content_features) %<-% 
    get_feature_representations(model, content_path, style_path)
  # dim(gram_style_features[[1]]) == (64, 64)
  gram_style_features <- map(style_features, function(feature) gram_matrix(feature))
  
  init_image <- load_and_process_image(content_path)
  init_image <- tf$contrib$eager$Variable(init_image, dtype = "float32")
  
  optimizer <- tf$train$AdamOptimizer(learning_rate = 1,
                                      beta1 = 0.99,
                                      epsilon = 1e-1)
  
  c(best_loss, best_image) %<-% list(Inf, NULL)
  loss_weights <- list(style_weight, content_weight)
  
  start_time <- Sys.time()
  global_start <- Sys.time()
  
  norm_means <- c(103.939, 116.779, 123.68)
  min_vals <- -norm_means
  max_vals <- 255 - norm_means
  num_iterations <- 500
  
  for (i in seq_len(num_iterations)) {
    # dim(grads) == (1, 128, 128, 3)
    c(grads, all_losses) %<-% compute_grads(model,
                                            loss_weights,
                                            init_image,
                                            gram_style_features,
                                            content_features)
    c(loss, style_score, content_score, variation_score) %<-% all_losses
    optimizer$apply_gradients(list(tuple(grads, init_image)))
    clipped <- tf$clip_by_value(init_image, min_vals, max_vals)
    init_image$assign(clipped)
    
    end_time <- Sys.time()
    
    if (k_cast_to_floatx(loss) < best_loss) {
      best_loss <- k_cast_to_floatx(loss)
      best_image <- init_image
    }
    
    if (i %% 50 == 0) {
      glue("Iteration: {i}") %>% print()
      glue(
        "Total loss: {k_cast_to_floatx(loss)},
        style loss: {k_cast_to_floatx(style_score)},
        content loss: {k_cast_to_floatx(content_score)},
        total variation loss: {k_cast_to_floatx(variation_score)},
        time for 1 iteration: {(Sys.time() - start_time) %>% round(2)}"
      ) %>% print()
      
      if (i %% 100 == 0) {
        png(paste0("style_epoch_", i, ".png"))
        plot_image <- best_image$numpy()
        plot_image <- deprocess_image(plot_image)
        plot(as.raster(plot_image), main = glue("Iteration {i}"))
        dev.off()
      }
    }
  }
  
  glue("Total time: {difftime(Sys.time(), global_start, units = 'secs') %>% round(2)} seconds") %>%
    print()
  list(best_image, best_loss)
}
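
The loop above is plain gradient descent on the image itself, with clipping and best-iterate tracking. As a toy analogue (hypothetical, not part of the blog code), the same pattern — compute the gradient, step, clip, keep the best loss so far — on a one-dimensional quadratic:

```r
# toy analogue of the loop in run_style_transfer():
# minimize loss(x) = (x - 3)^2 with gradient steps, clipping, best tracking
loss <- function(x) (x - 3)^2
grad <- function(x) 2 * (x - 3)

x <- 0            # analogue of init_image
best_loss <- Inf  # analogue of c(best_loss, best_image) %<-% list(Inf, NULL)
best_x <- NULL
lr <- 0.1

for (i in 1:100) {
  x <- x - lr * grad(x)       # analogue of optimizer$apply_gradients()
  x <- min(max(x, -5), 5)     # analogue of tf$clip_by_value()
  if (loss(x) < best_loss) {  # keep the best iterate seen so far
    best_loss <- loss(x)
    best_x <- x
  }
}
round(best_x, 4)  # close to the minimizer at 3
```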

Now we execute the entire process!

c(best_image, best_loss) %<-% run_style_transfer(content_path, style_path)

  2. Make your own tweaks. For example, try different tuning parameter values and report what you found, or try a new data set, or apply the method to a new application.

We’ll now explore three types of analyses by resolution, epochs, and style changes.

**Resolution:** We have already seen the original construction of the final transmuted image at 128x128 resolution. The final image quality was quite poor, making important stylistic features difficult to discern.

For the image generated after 500 iterations, our 128x128 resolution result is:

path1 <-"~/biostatm280-winter2019-hw4/Q1/originalRendition/style_epoch_500.png"

content_image <-  image_load(path1, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

For the same image generated after 500 iterations, our 256x256 resolution result is:

path1 <-"~/biostatm280-winter2019-hw4/Q1/originalRendition/style_epoch_500.png"

content_image <-  image_load(path1, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

Although not a one-to-one comparison, for the sake of presentation we now examine the highest resolution possible given our limited GPU processing power, 512x512, on a different image. Our content image is the famous painter Albert Bierstadt's painting of the Rocky Mountains:

path1 <-"~/biostatm280-winter2019-hw4/AlbertBierstadt.jpg"

content_image <-  image_load(path1, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

We transform the original picture using Van Gogh's The Starry Night as the style image:

path1 <-"~/biostatm280-winter2019-hw4/albert_epoch_500.png"

content_image <-  image_load(path1, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

## Q2 Deep learning on smart phone

Professor May Wang in the Department of Community Health Sciences (CHS) studies obesity in children and intervention strategies to prevent obesity. She asked me whether it is possible to develop an app such that a user takes a photo of a meal and the app will recognize and record the type of food (pizza, mac and cheese, burger, …).

Your job: produce a prototype app for iPhone or Android smart phone.

Resources:
1. There are plenty of tutorials and YouTube clips on making apps for iPhone or Android.
2. Google’s Cloud Vision API may supply an easy cloud solution.
3. TensorFlow Lite may provide an easy mobile solution.